Prediction with Missing Inputs

Author

  • Warren S. Sarle

Abstract

For the purposes of this paper, data mining involves:

  • Predictive modeling (estimating parameters is not of major interest)
  • Nonnormal data with nonlinear relationships
  • Large data sets

This paper will not cover issues that arise with small data sets, such as Bayesian predictive distributions, or tree-based models, for which specialized methods for handling missing data are available.

Missing data are a major concern in data mining, both because a substantial proportion of the data may be missing and because predictions must be made for cases with missing inputs.

The terminology used in data mining comes from a variety of fields, including statistics, artificial intelligence, and engineering. The terms training, estimation, and model fitting are used interchangeably. A target is what statisticians call a dependent variable, and an input is what they call an independent variable. Generalization is the ability of a model to make good predictions for data not used during training. The main purpose of data mining is to obtain good generalization; the purpose of this paper is to examine how to obtain good generalization when some inputs are missing.

Main points:

  • What is true for linear regression with multivariate normal data is often not true for nonlinear models (such as neural nets) with nonnormal data.
  • What is true for estimation is often not true for prediction.
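To make the vocabulary above concrete (training, input, target, generalization, and a case with a missing input at prediction time), here is a minimal Python sketch. It assumes a one-input linear model fitted on complete training cases and fills a missing input with its training mean at prediction time. The function names, the toy data, and the choice of mean imputation are hypothetical illustrations, not the method proposed in the paper; mean squared error on held-out cases stands in for generalization.

```python
from statistics import mean


def fit_simple_regression(xs, ys):
    """Train: least-squares fit of target = a + b * input on complete cases."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Keep the training mean of the input so it can fill in missing inputs later.
    return a, b, x_bar


def predict(model, x):
    """Predict the target; a missing input (None) is replaced by its training mean.

    Mean imputation is an assumed baseline here, chosen only for illustration.
    """
    a, b, x_bar = model
    if x is None:
        x = x_bar
    return a + b * x


def held_out_mse(model, cases):
    """Mean squared prediction error on (input, target) pairs not used in training."""
    return mean((y - predict(model, x)) ** 2 for x, y in cases)


if __name__ == "__main__":
    model = fit_simple_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
    print(model)                    # (1.0, 2.0, 3)
    print(predict(model, None))     # 7.0: the input is missing, so its mean is used
    print(held_out_mse(model, [(6, 13), (None, 9), (7, 14)]))
```

With a single input, mean imputation simply returns the mean prediction, so the held-out case with a missing input contributes most of the error. Whether such a shortcut generalizes acceptably for nonlinear models with nonnormal data is exactly the kind of question the abstract says cannot be settled by analogy with linear regression.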

Publication date: 1998